{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pandas实现groupby聚合后不同列数据统计"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "电影评分数据集（UserID，MovieID，Rating，Timestamp）  \n",
    "\n",
    "<b>聚合后单列-单指标统计：每个MovieID的平均评分</b>  \n",
    "df.groupby(\"MovieID\")[\"Rating\"].mean()\n",
    "\n",
    "<b>聚合后单列-多指标统计：每个MoiveID的最高评分、最低评分、平均评分</b>  \n",
    "df.groupby(\"MovieID\")[\"Rating\"].agg(mean=\"mean\", max=\"max\", min=np.min)  \n",
    "df.groupby(\"MovieID\").agg({\"Rating\":['mean', 'max', np.min]})\n",
    "\n",
    "<b>聚合后多列-多指标统计：每个MoiveID的评分人数，最高评分、最低评分、平均评分</b>  \n",
    "df.groupby(\"MovieID\").agg(rating_mean=(\"Rating\", \"mean\"),user_count=(\"UserID\", lambda x : x.nunique())  \n",
    "df.groupby(\"MovieID\").agg({\"Rating\": ['mean', 'min', 'max'],\"UserID\": lambda x :x.nunique()})  \n",
    "df.groupby(\"MovieID\").apply(lambda x: pd.Series({\"min\": x[\"Rating\"].min(), \"mean\": x[\"Rating\"].mean()}))  \n",
    "\n",
    "\n",
    "\n",
    "<b>记忆：</b>agg(新列名=函数)、agg(新列名=(原列名，函数))、agg({\"原列名\"：函数/列表})  \n",
    "agg函数的两种形式，等号代表“把结果赋值给新列”，字典/元组代表“对这个列运用这些函数”\n",
    "\n",
    "\n",
    "\n",
    "官网文档：https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.core.groupby.DataFrameGroupBy.agg.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 读取数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\n",
    "    \"./datas/movielens-1m/ratings.dat\", \n",
    "    sep=\"::\",\n",
    "    engine='python', \n",
    "    names=\"UserID::MovieID::Rating::Timestamp\".split(\"::\")\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>UserID</th>\n",
       "      <th>MovieID</th>\n",
       "      <th>Rating</th>\n",
       "      <th>Timestamp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1193</td>\n",
       "      <td>5</td>\n",
       "      <td>978300760</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>661</td>\n",
       "      <td>3</td>\n",
       "      <td>978302109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>914</td>\n",
       "      <td>3</td>\n",
       "      <td>978301968</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   UserID  MovieID  Rating  Timestamp\n",
       "0       1     1193       5  978300760\n",
       "1       1      661       3  978302109\n",
       "2       1      914       3  978301968"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 聚合后单列-单指标统计"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MovieID\n",
       "1    4.146846\n",
       "2    3.201141\n",
       "3    3.016736\n",
       "4    2.729412\n",
       "5    3.006757\n",
       "Name: Rating, dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 每个MovieID的平均评分\n",
    "result = df.groupby(\"MovieID\")[\"Rating\"].mean()\n",
    "result.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 聚合后单列-多指标统计"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "每个MoiveID的最高评分、最低评分、平均评分"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 方法1：agg函数传入多个结果列名=函数名形式"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>max</th>\n",
       "      <th>min</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MovieID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.146846</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.201141</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.016736</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.729412</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>3.006757</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             mean  max  min\n",
       "MovieID                    \n",
       "1        4.146846    5    1\n",
       "2        3.201141    5    1\n",
       "3        3.016736    5    1\n",
       "4        2.729412    5    1\n",
       "5        3.006757    5    1"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = df.groupby(\"MovieID\")[\"Rating\"].agg(\n",
    "    mean=\"mean\", max=\"max\", min=np.min\n",
    ")\n",
    "result.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 方法2：agg函数传入字典，key是column名，value是函数列表"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">Rating</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>max</th>\n",
       "      <th>amin</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MovieID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.146846</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.201141</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.016736</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.729412</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>3.006757</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Rating         \n",
       "             mean max amin\n",
       "MovieID                   \n",
       "1        4.146846   5    1\n",
       "2        3.201141   5    1\n",
       "3        3.016736   5    1\n",
       "4        2.729412   5    1\n",
       "5        3.006757   5    1"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 每个MoiveID的最高评分、最低评分、平均评分\n",
    "result = df.groupby(\"MovieID\").agg(\n",
    "    {\"Rating\":['mean', 'max', np.min]}\n",
    ")\n",
    "result.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age_mean</th>\n",
       "      <th>age_min</th>\n",
       "      <th>age_max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MovieID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.146846</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.201141</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.016736</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.729412</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>3.006757</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         age_mean  age_min  age_max\n",
       "MovieID                            \n",
       "1        4.146846        5        1\n",
       "2        3.201141        5        1\n",
       "3        3.016736        5        1\n",
       "4        2.729412        5        1\n",
       "5        3.006757        5        1"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result.columns = ['age_mean', 'age_min', 'age_max']\n",
    "result.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 聚合后多列-多指标统计\n",
    "\n",
    "每个MoiveID的评分人数，最高评分、最低评分、平均评分"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 方法1：agg函数传入字典，key是原列名，value是原列名和函数元组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>rating_mean</th>\n",
       "      <th>rating_min</th>\n",
       "      <th>rating_max</th>\n",
       "      <th>user_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MovieID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.146846</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>2077</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.201141</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.016736</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>478</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.729412</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>3.006757</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>296</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         rating_mean  rating_min  rating_max  user_count\n",
       "MovieID                                                 \n",
       "1           4.146846           1           5        2077\n",
       "2           3.201141           1           5         701\n",
       "3           3.016736           1           5         478\n",
       "4           2.729412           1           5         170\n",
       "5           3.006757           1           5         296"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 回忆：agg函数的两种形式，等号代表“把结果赋值给新列”，字典/元组代表“对这个列运用这些函数”\n",
    "result = df.groupby(\"MovieID\").agg(\n",
    "        rating_mean=(\"Rating\", \"mean\"),\n",
    "        rating_min=(\"Rating\", \"min\"),\n",
    "        rating_max=(\"Rating\", \"max\"),\n",
    "        user_count=(\"UserID\", lambda x : x.nunique())\n",
    ")\n",
    "result.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 方法2：agg函数传入字典，key是原列名，value是函数列表\n",
    "\n",
    "统计后是二级索引，需要做索引处理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">Rating</th>\n",
       "      <th>UserID</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>&lt;lambda&gt;</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MovieID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.146846</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>2077</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.201141</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.016736</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>478</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.729412</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>3.006757</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>296</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Rating           UserID\n",
       "             mean min max <lambda>\n",
       "MovieID                           \n",
       "1        4.146846   1   5     2077\n",
       "2        3.201141   1   5      701\n",
       "3        3.016736   1   5      478\n",
       "4        2.729412   1   5      170\n",
       "5        3.006757   1   5      296"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = df.groupby(\"MovieID\").agg(\n",
    "    {\n",
    "        \"Rating\": ['mean', 'min', 'max'],\n",
    "        \"UserID\": lambda x :x.nunique()\n",
    "    }\n",
    ")\n",
    "result.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MovieID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.146846</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.201141</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.016736</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             mean  min  max\n",
       "MovieID                    \n",
       "1        4.146846    1    5\n",
       "2        3.201141    1    5\n",
       "3        3.016736    1    5"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"Rating\"].head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>rating_mean</th>\n",
       "      <th>rating_min</th>\n",
       "      <th>rating_max</th>\n",
       "      <th>user_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MovieID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.146846</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>2077</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.201141</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.016736</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>478</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.729412</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>3.006757</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>296</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         rating_mean  rating_min  rating_max  user_count\n",
       "MovieID                                                 \n",
       "1           4.146846           1           5        2077\n",
       "2           3.201141           1           5         701\n",
       "3           3.016736           1           5         478\n",
       "4           2.729412           1           5         170\n",
       "5           3.006757           1           5         296"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result.columns = [\"rating_mean\", \"rating_min\",\"rating_max\",\"user_count\"]\n",
    "result.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 方法3：使用groupby之后apply对每个子df单独统计"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>rating_mean</th>\n",
       "      <th>rating_min</th>\n",
       "      <th>rating_max</th>\n",
       "      <th>user_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MovieID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.146846</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>2077.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.201141</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>701.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.016736</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>478.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.729412</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>170.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>3.006757</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>296.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         rating_mean  rating_min  rating_max  user_count\n",
       "MovieID                                                 \n",
       "1           4.146846         1.0         5.0      2077.0\n",
       "2           3.201141         1.0         5.0       701.0\n",
       "3           3.016736         1.0         5.0       478.0\n",
       "4           2.729412         1.0         5.0       170.0\n",
       "5           3.006757         1.0         5.0       296.0"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def agg_func(x):\n",
    "    \"\"\"注意，这个x是子DF\"\"\"\n",
    "    \n",
    "    # 这个Series会变成一行，字典KEY是列名\n",
    "    return pd.Series({\n",
    "        \"rating_mean\": x[\"Rating\"].mean(),\n",
    "        \"rating_min\": x[\"Rating\"].min(),\n",
    "        \"rating_max\": x[\"Rating\"].max(),\n",
    "        \"user_count\": x[\"UserID\"].nunique()\n",
    "    })\n",
    "\n",
    "result = df.groupby(\"MovieID\").apply(agg_func)\n",
    "result.head()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
